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ABSTRACT 

An iterative algorithm for the efficient solution of systems of nonlinear hyperbolic 
equations is presented. Parallelism is evident at several levels. In the formation of 
the iteration, the equations are decoupled, thereby providing large grain parallelism. 
Parallelism may also be exploited within the solves for each equation. Convergence of 
the iteration is established via a bounding function argument. Experimental results in 
two- dimensions are presented. 


1 Research conducted at ICASE, NASA Langley Research Center, Hampton, Virginia, supported by 
NASA Contract No. NAS1-18605. 
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1. INTRODUCTION. An iterative algorithm suitable for the solution of a sys- 
tem of nonlinear hyperbolic partial differential equations in multiple dimensions is dis- 
cussed. Convergence is established analytically in the continuous case when the solu- 
tions are smooth; however, numerical experiments with a system of nonlinar conser- 
vation laws demonstrate that convergence is achieved in the discrete case even when 
the solutions have discontinuities (shocks). This method is an extension of an iterative 
method for one- dimensional scalar equations [11, 6] to systems of equations in multiple 
dimensions. 

This method decouples the PDEs by linearizing the convection coefficient for a 
space-time domain. This provides opportunity to exploit large-grain parallelism. In 
addition, the linearization allows the treatment of some terms in the equations as source 
theorems, providing more freedom to choose from a wider variety of numerical methods. 
This could be exploited, for example, when extending the algorithm that couples the 
method of characteristics with cellular automata [8] from a scalar equation to a system 
of equations [7]. Cellular automata and the method of characteristics are both sources 
of parallelism. 

The iteration is presented in Section 3. Convergence of the method is established 
analytically in Section 4, and numerical experiments demonstrate that the solution 
improves by more than a digit in each iteration in Section 6. 

2. PROBLEM. Consider the following initial-boundary value problem 

(1) ^-t-'£F i (t,x,U)^- + R[t,x,U)U = 0, 

where U = ...,Ur„) is a vector of m > 1 components. Assume that the spatial 

domain can be specified by a reeil valued function d(x ) as II = {x | d(x) < 0}. The 
range of the temporal variable is 0 < t < T, and the complete domain is denoted 
Q = (0,T) x II = {(i, x) | A(f,x) < 0 }, where A(x,f) = td(x). The coefficients R and 
Fi, for i = 1 to 7i, are matrix-valued functions in ]R mXn . 

Denote by Fij the j th row of the matrix Fi and denote the I th component of F{j by 
fijjL. Similarly, r J( j denotes the component of R that is in the j th row and I th column. 
The data 

(2) Uj(t,x) = aj (t,x) 

serve as the initial guess when t = 0 and x € II, and as the boundary data for the 
portion of 511 such that 

:= F h: j(t i x,U) • grad(A) <0. 
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( 3 ) 



Here, grad(A) is the gradient of A with respect to the spatial variables. This is anal- 
ogous to the inflow boundary conditions of fluid dynamics. Other types of boundary 
conditions may be possible [2], but are not studied here. 


3. ITERATION. From Equation (1), each component of the solution satisfies 


( 4 ) 


rj'i I . w f)lL • 

x > + r «(*> x > U ) u i = 


n m a m 

.■ l±j OX ' t#j 

Let U k = be the k th iterate. The equation for the j th component is 

linearized by using the most recent iterate only in the places where the the j th component 
is differentiated. This results in the equation 


( 5 ) 


Bu k + l n 8u k+l 

+ £ /W(*. x > U k )^- + x , U k )u k+1 


n m 




duf 

dxi 


Y, r ij( t ’ x > Uk ) u l- 

Vi 


Thus, the equations for each of the iterates have been decoupled. This is analogous to 
the Jacobi iteration for systems of linear equations. Each u k - satisfies (2) for the portion 
of dO, such that ipj(t,x, U k ~ 1 ) < 0. 

Forms of the iteration other than (5) are possible and may be desirable. For exam- 
ple, if parallelism is not important, then it may be advantageous to linearize but not 
decouple the equations. 


4. CONVERGENCE. The convergence of the iteration has its roots in the exis- 
tence and uniqueness proofs of partial differential equations [1, 9, 4]. However, existence 
and uniqueness are not the topics of this paper. The iteration defined by Equation (5) 
is shown to be a contraction provided the domain is restricted to regions where the 
solution is sufficiently smooth and continuous. The numerical experiments of Section 
6 demonstrate that the restriction on the smoothness of U is only for the sake of the 
analysis, and the iteration will converge under less strict conditions. In the statement 
of the theorem, IT is the closure of fl, and 



denotes the L 2 norm in space. 

Theorem 4.1. Assume that F{ and R depend continuously on their arguments . 
If the data ctj of Equation (2) and their first derivatives are continuous and the initial 
guess U° is Lipschitz continuous on £1, then for each k there are constants C k > 0 and 
\ k > 0 

(6) ||u} +1 - u% < - l)||uj - u‘-'|lr 

when 0 < i < T and for l < j < m and finite k > 1, and U k is the k th iterate of 
Equation (5). 

First, the boundedness and continuity of the iterates is established. 

Lemma 4.2. Assume that and R depend continuously on their arguments. 
Suppose the data ctj of Equation (2) and their first derivatives are continuous , and the 
iterate U k is Lipschitz continuous on Q, then U k+1 is Lipschitz continuous for 0 < t < T. 


To establish this lemma the equation governing the iterates will be simplified. Then 
the iterate will be shown to be bounded, and the continuity of the iterates will be 
established. 

Proof. Equation (5) may be written 


( 7 ) 


du k+1 

dt 


+ 9h 


u 


k+1 


+ E *£ 

l=l. 


u, 


= 0 , 


where 

9ij = r u {t,x,U k ) + 

♦=i UXt 

The Lipschitz continuity of U k and the continuity of the boundary data results in the 
coefficient g k j and the summation term in the equation being bounded. Since the 
domain is bounded, iterate U k+1 will also be bounded. 

An equation governing du k+1 /dx t may be derived by taking the partial derivative 
of Equation (7) with respect to Xi to obtain 


( 8 ) 


dt \ dxi J 3,1 \ dxi J 


+ 5 — 0, 


where the source term 5 and the coefficient g k j of this equation are functions of u k+1 
and the values and first derivatives of R, Fi, and U k . The boundedness of u k+1 has been 
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established, and the boundedness of the remaining terms follow from the assumptions 
made for this lemma. Thus, the boundedness of S and follow. From the boundedness 
of these terms the boundedness of du k+1 /dxi (and hence the Lipschitz continuity of 
u* +1 ) follow. □ 

To establish Theorem 4.1, an equation governing the difference between iterates 
is derived. Since the iterates are Lipschitz continuous and the domain is bounded, 
appropriate upper and lower bounds for U k+1 will be determined. Then the difference 
is shown to be bounded with solutions of ordinary differential equations. 

Proof (of Theorem 4.1). The components of the difference Z k = (z k , where 
z k = u* +1 — u J are governed by the equation 

3 3 3 


( 9 ) 


ir + S> 

oi 


k fc— 1 

u z i 


+ (A-flfr 1 K - 1 


k „fc 


] + 9j,j z 


j + (4 


k 

3,3 



The initial data is zj( 0, x ) = 0. By the Mean Value Theorem 


k - 1 


9iJ 9ij 


= Z 


k - i 


Gij, 


where 


G k j = V u x,U) + ^2 V u x, &) q x / 

t=l * 


Here, U is some function bounded above and below by max(U k a , U k ) and min{U k 1 ,U ), 
respectively. Thus, Equation (9) may be written as 


( 10 ) 


dz) 

dt 


—L + a k Z k = —Z k ~ l • 
FU ^*3,3*3 


1*3 


k - 1 


-E 


k k - 1 

9lj z l ■ 


1*3 


Using the lemma, the absolute value of the terms on the right hand side of Equation 
(10) are bounded by C}\\Z k ~ l \\ t for some constant C k . In addition, the coefficient of 
z k is bounded above (below) by some constant \ k ( A* 5 ) . Thus, the functions 

TJj = tCj\\Z k ~ 1 \\T^ t 


u3j_=tC) WZ^Wre^ 

are upper and lower bounds, respectively, for ||z*||. With C k = maxj C- and A = 
maxj A*, the theorem is established. □ 
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Theorem 4.1 did not establish the boundedness of C k and A* as k goes to infinity. If 
these constants did grow unbounded for large k y then the size of the temporal partition 
for which the iteration would converge will tend to zero. In practice, the iteration 
converged in a finite number of iterations, and the effect of unbounded growth in C k 
and \ k was not experienced. When these constants are assumed to be bounded, as 
in the corollary below, the convergence of the iteration follows immediately from the 
theorem. 

Corollary 4 . 3 . Assume that and R depend continuously on their arguments . 
Also Assume that the data otj of Equation (2) and their first derivatives are continuous 
and the initial guess U° is Lipschitz continuous on TT. If there are bounds C > C k and 
A > A for all k >= 1, then then the iteration defined by (5) converges to a solution of 
( 1 ) • 

Proof. From Theorem 4.1, 


ll“* +1 - “‘IIt < C(«“ - l)||uj 


It is possible to choose a temporal bound T such that C(e tx - 1) < 1 when T <T, and 
the iteration is a contraction. 

Since the iteration is a contraction, it must converge to a fixed point U°° = U k = 
U k+1 . Putting U°° into Equation (5), clearly U = U°° is a solution to (l)-(2). □ 


Thus, there are reasonable conditions under which the iteration will converge. 
Stronger results would likely be possible, especially considering that the experiments 
presented later in the paper do not satisfy all of the conditions of the analytic results. 

5. ALGORITHM. An algorithm based on the iteration is presented here. The 
algorithm is a combination of the iteration presented in Section 3 with a temporal 
partitioning and a stopping criteria. 

The temporal variable may need to be partitioned into several regions. For example, 
it is possible for the method to diverge if T is too large. This is manifested by the 
coefficient C k (e tX — 1) of Equation (6) in Theorem 4.1 becoming larger than unit. In 
addition, the iteration requires that the solution be stored for the entire temporal region, 
possibly requiring too much memory. These problems are resolved by partitioning time 
into Q sections 0 < To < T\ < ... < Tq = T. The iteration will be performed on the 
partition fl, = ( T q _i,T q ] x II of the domain, using the solution at time t = T q _ x from 
the iteration on as the initial condition. 

The stopping criteria is based on the norm of the difference between iterates at time 

t = T q . When the norm is less than some user specified tolerance, then the iteration is 
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assumed to have converged. 


I. Initialize. 

A. Set temporal partition counter to q = 1. 

B. Apply initial data (2) to the solution. 

II. Determine the solution on temporal partition q. 

A. Doall j = 1 to n (j is the index for the elements of the solution). 

1. Determine initial guess u° for ( T q _i,T q ) x II. 

B. Initialize iteration counter k — 1. 

C. Doall j = 1 to n (do one iteration). 

1. Solve Equation (5) to obtain u k then 

2. compute the norm of the difference K k = ||u* — u k 1 ||. 

D. If K k = Ej Kj > TOL 

1. Then increment k and go to Step C. 

2. Else finished with this temporal partition. 

a. Restart by using the solution at t — T,_ i as initial conditions. 

b. Increment q and goto Step II. 

Algorithm 1 Iteration with Restart. 

The algorithm is independent of the the discretization used in Step II.C.l and 
of the initial guess used in Step II. A. 1 (see Algorithm 1). Since the coefficients have 
nonlinear behavior, discretizations should be chosen that are appropriate for nonlinear 
problems. Notice that if an explicit discretization is used with a temporal partition that 
contain only a single time step, then the algorithm is no different than applying the 
explicit discretization directly to the linearization of Equation (1). Thus, each temporal 
partition should contain at least two time steps. 

There is potential to exploit parallelism on several levels. Large-grain parallelism 
is available through the decoupling of the equations, and may be further enhanced 
by using domain decomposition techniques (see [10]). Smaller-grain parallelism may 
be exploited by using appropriate data structures for each of the subdomains [3, 5], 
and by choosing an appropriate numerical scheme to solve the PDEs. Exploitation of 
parallelism is an area of further research. 

6. NUMERICAL EXPERIMENTS. In this section, Algorithm 1 is used to 
solve a system of nonlinear hyperbolic equations. These numerical experiments are a 
demonstration of the convergence of the algorithm. The implementation of the algo- 
rithm has not been optimized or parallelized; thus only the convergence results are 
presented. Experiments also examine the effects of the size of the temporal partition 
on the number of iterations. All the experiments were performed on an Ardent Titan, 

and required less than 5 minutes execution time for each run. 
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The equations solved are a two-dimensional form of Burgers’ equation 


(11) u t + uu x + vu v = 0 

(12) V t + uv x + vv y = 0, 

where time has the range 0 < t < 1 and the spatial domain is the unit cube II = 
{(*,y) | 0 > A (x,y) = t(max[(x — 1/2 ) 2 ,(y — 1/2) 2 ] — 1/4)}. The equations are 
capable of simulation of some of the physical phenomena that arise in computational 
fluid dynamics. Namely, with initial and boundary data 

u = 2.0(.5 - y), 
v = .1 -f .9x, 

Equations (11-12) will be used to model shocks. The data are applied for t = 0 and for 
the inflow portion of II which is defined by y = 0, y = 1, and x = 1. The solution is 
smooth initially, and develops a shock (see Figures 1-2). Thus, the examples satisfy all 



Fig. 1. U at t = .8. 

of the conditions of the theoretical results for the initial partition fix, and evolves into 
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Fig. 2. V at t = .8. 


a nonsmooth function that violates these conditions. Thus, the experimental results 
will demonstrate that the algorithm works even when some of the conditions used to 
established the theory are violated. 

The experiments were performed ir-, mg two different explicit discretizations, a 
second-order MacCormack scheme [2], and a first-order upwind scheme. The upwind 
scheme consists of applying a backward or forward difference based on the sign of the 
coefficient of the spatial derivative being differenced. 

To improve computational efficiency, the discrete analog of the L\ norm at time 
t — Tq is used in place of the L 2 norm in Step II.C.2. Thus, 


(13) 


= 4 1 K, - “l; 1 

r P =i 


> 


where P is the number of spatial grid points and u* p is the discrete value of component 
j of iterate k. There were 25 unknowns in each direction including boundary points; 
thus, the total number of points was P = 625. The time step was At = .002. 

The initial guess on fl, is 


u°(t,x,y ) = u fc (r,_i,x,y) 
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and 


v°(t,x,y) = v k (T q _i,x,y), 

where (u k ,v k ) was the last iterate for partition This resulted in a good initial 

guess as measured by the norm K l of the difference between the initial guess and the 
first iterate (see Table 1). - 


Table 1. Average Number of Iterations 


Number of 
Time Steps 

Upw 

K' 

ind 

Num Iter 

MacCoi 

K 1 

■mack 
Num Iter 

20 

9.94 x 10~ 3 

3.8 

2.93 x 10" 2 

4.1 

50 

2.43 x 10~ 2 

4.7 

7.12 x 10- 2 

6.7 

100 

4.69 x 10- 2 

5.6 

1.38 x lO" 1 

8.6 


The results in Table 1 show the effects of varying the temporal partition size on 
the convergence of the method with the initial guess as described above. The data axe 
the average over the total number of temporal partitions. Thus, corresponding to the 
number of time steps of 20, 50, and 100 arc partition sizes T q — T q _i = .04, .1, and .2, 
respectively. The results in the Number of Iterations columns are the data averaged over 
all of the temporal partitions required to take the 500 time steps necessary to obtain the 
solution at T = 1. This means that the numbers are averaged over 50, 10, and 5 values 
for the rows with 20, 50, and 100 number of time steps, respectively. Experiments 
using a constant initial guess confirm that the additional iterations required for the 
larger partitions are not due to the larger beginning norm K 1 . Further work in this 
area is necessary to establish the reason for this trend; however, the larger computational 
domain resulting from the larf' temporal partition is a likely cause. 

Linear convergence is demonstrated by the data from numeric. - .! experiments in 
Table 2. A steady shock has formed in the solution by time t = .8; thus, the data in the 

Table 2. Reduction in Norm for all iterations 


Iteration 

2 

Num Stopped 
0 

Ave Reduction 
6.99 x 10" 2 

3 

3 

5.48 x 10" 2 

4 

1 

5.87 x 10" 2 

5 

3 

4.09 x lO" 2 

6 

3 

2.88 x 10" 2 


table reflect that the reduction in the difference was still linear when the solution has 
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large gradients that violate some of the conditions of the theorems. The tolerance TOL 
for the stopping criteria was set to 2 x 10“ 7 , slightly lower than machine precision. The 
iteration on each partition used the TOL stopping criteria. This resulted in iterations 
on various temporal partitionings stopping at different iterates. The number of iterates 
on a partition that stopped at the iteration number in the left column is reported in the 
5 Num Stopped 5 column. The average reduction column is thus the relative reduction 
K k /K k ~ l as measured over the number of times the iteration made it that far. 

The experiments confirm and go beyond the analytic results. The reduction in the 
error for each component of the solution in each iteration is monotonic, as predicted 
by Theorem 4.1 was confirmed. The experiments went beyond the analytic results by 
showing a linear reduction of slightly more than a digit of accuracy on a model problem 
with a shock. Larger temporal partitions required more iterations. This indicates that 
the larger partitions will require more computations, but may require fewer synchro- 
nizations. To reach any conclusions, this issue could be studied once the algorithm has 
been implemented in a parallel processing environment. 

7. CONCLUSION. An iterative algorithm for the efficient solution of systems of 
nonlinear hyperbolic equations has been discussed. The method is simple to implement, 
and has parallelism that can be exploited on several levels. In this paper, convergence 
was established analytically for continuous solutions. Numerical experiments demon- 
strated the monotone convergence predicted by the analysis, both when the solution 
was smooth, and when the problem had shocks resulting from the nonlinearities. 
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